inclusion probability
Uncertainty-Aware Sparse Identification of Dynamical Systems via Bayesian Model Averaging
Kashiwamura, Shuhei, Kato, Yusuke, Kori, Hiroshi, Okada, Masato
In many problems of data-driven modeling for dynamical systems, the governing equations are not known a priori and must be selected phenomenologically from a large set of candidate interactions and basis functions. In such situations, point estimates alone can be misleading, because multiple model components may explain the observed data comparably well, especially when the data are limited or the dynamics exhibit poor identifiability. Quantifying the uncertainty associated with model selection is therefore essential for constructing reliable dynamical models from data. In this work, we develop a Bayesian sparse identification framework for dynamical systems with coupled components, aimed at inferring both interaction structure and functional form together with principled uncertainty quantification. The proposed method combines sparse modeling with Bayesian model averaging, yielding posterior inclusion probabilities that quantify the credibility of each candidate interaction and basis component. Through numerical experiments on oscillator networks, we show that the framework accurately recovers sparse interaction structures with quantified uncertainty, including higher-order harmonic components, phase-lag effects, and multi-body interactions. We also demonstrate that, even in a phenomenological setting where the true governing equations are not contained in the assumed model class, the method can identify effective functional components with quantified uncertainty. These results highlight the importance of Bayesian uncertainty quantification in data-driven discovery of dynamical models.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > United States > Michigan (0.04)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Stability of Sequential and Parallel Coordinate Ascent Variational Inference
We highlight a striking difference in behavior between two widely used variants of coordinate ascent variational inference: the sequential and parallel algorithms. While such differences were known in the numerical analysis literature in simpler settings, they remain largely unexplored in the optimization-focused literature on variational inference in more complex models. Focusing on the moderately high-dimensional linear regression problem, we show that the sequential algorithm, although typically slower, enjoys convergence guarantees under more relaxed conditions than the parallel variant, which is often employed to facilitate block-wise updates and improve computational efficiency.
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Optimal Subsampling with Influence Functions
As the amount of data increases, the question arises as to how best to deal with the large datasets. While computational platforms such as Spark [28] and Ray [23] help process large datasets once a desired model is chosen, simply using smaller data can be a faster solution for exploratory data modeling, rapid prototyping, or other tasks where the accuracy obtainable from the full dataset is notneeded.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East > Jordan (0.04)
- Education (0.46)
- Information Technology > Security & Privacy (0.46)
- Education (0.46)
- Information Technology > Security & Privacy (0.46)
Large-Scale Bayesian Causal Discovery with Interventional Data
Han, Seong Woo, Vo, Daniel Duy, Brown, Brielin C.
Inferring the causal relationships among a set of variables in the form of a directed acyclic graph (DAG) is an important but notoriously challenging problem. Recently, advancements in high-throughput genomic perturbation screens have inspired development of methods that leverage interventional data to improve model identification. However, existing methods still suffer poor performance on large-scale tasks and fail to quantify uncertainty. Here, we propose Interventional Bayesian Causal Discovery (IBCD), an empirical Bayesian framework for causal discovery with interventional data. Our approach models the likelihood of the matrix of total causal effects, which can be approximated by a matrix normal distribution, rather than the full data matrix. We place a spike-and-slab horseshoe prior on the edges and separately learn data-driven weights for scale-free and Erdős-Rényi structures from observational data, treating each edge as a latent variable to enable uncertainty-aware inference. Through extensive simulation, we show that IBCD achieves superior structure recovery compared to existing baselines. We apply IBCD to CRISPR perturbation (Perturb-seq) data on 521 genes, demonstrating that edge posterior inclusion probabilities enable identification of robust graph structures.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- (3 more...)
We Have It Covered: A Resampling-based Method for Uplift Model Comparison
Uplift models play a critical role in modern marketing applications to help understand the incremental benefits of interventions and identify optimal targeting strategies. A variety of techniques exist for building uplift models, and it is essential to understand the model differences in the context of intended applications. The uplift curve is a widely adopted tool for assessing uplift model performance on the selection universe when observations are available for the entire population. However, when it is uneconomical or infeasible to select the entire population, it becomes difficult or even impossible to estimate the uplift curve without appropriate sampling design. To the best of our knowledge, no prior work has addressed uncertainty quantification of uplift curve estimates, which is essential for model comparisons. We propose a two-step sampling procedure and a resampling-based approach to compare uplift models with uncertainty quantification, examine the proposed method via simulations and real data applications, and conclude with a discussion.
- North America > United States (0.14)
- North America > Montserrat (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Marketing (0.67)
- Health & Medicine (0.46)
Bayesian Models for Joint Selection of Features and Auto-Regressive Lags: Theory and Applications in Environmental and Financial Forecasting
Manna, Alokesh, Ghosh, Sujit K.
We develop a Bayesian framework for variable selection in linear regression with autocorrelated errors, accommodating lagged covariates and autoregressive structures. This setting occurs in time series applications where responses depend on contemporaneous or past explanatory variables and persistent stochastic shocks, including financial modeling, hydrological forecasting, and meteorological applications requiring temporal dependency capture. Our methodology uses hierarchical Bayesian models with spike-and-slab priors to simultaneously select relevant covariates and lagged error terms. We propose an efficient two-stage MCMC algorithm separating sampling of variable inclusion indicators and model parameters to address high-dimensional computational challenges. Theoretical analysis establishes posterior selection consistency under mild conditions, even when candidate predictors grow exponentially with sample size, common in modern time series with many potential lagged variables. Through simulations and real applications (groundwater depth prediction, S&P 500 log returns modeling), we demonstrate substantial gains in variable selection accuracy and predictive performance. Compared to existing methods, our framework achieves lower MSPE, improved true model component identification, and greater robustness with autocorrelated noise, underscoring practical utility for model interpretation and forecasting in autoregressive settings.
- North America > United States > Connecticut (0.04)
- North America > United States > Texas (0.04)
- North America > United States > South Carolina (0.04)
- North America > United States > North Carolina (0.04)
- Retail (1.00)
- Information Technology > Services (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Prediction-Powered Inference with Inverse Probability Weighting
Datta, Jyotishka, Polson, Nicholas G.
Prediction-powered inference (PPI) is a recent framework for valid statistical inference with partially labeled data, combining model-based predictions on a large unlabeled set with bias correction from a smaller labeled subset. We show that PPI can be extended to handle informative labeling by replacing its unweighted bias-correction term with an inverse probability weighted (IPW) version, using the classical Horvitz--Thompson or Hájek forms. This connection unites design-based survey sampling ideas with modern prediction-assisted inference, yielding estimators that remain valid when labeling probabilities vary across units. We consider the common setting where the inclusion probabilities are not known but estimated from a correctly specified model. In simulations, the performance of IPW-adjusted PPI with estimated propensities closely matches the known-probability case, retaining both nominal coverage and the variance-reduction benefits of PPI.
- North America > United States > Virginia (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
Variational empirical Bayes variable selection in high-dimensional logistic regression
Logistic regression involving high-dimensional covariates is a practically important problem. Often the goal is variable selection, i.e., determining which few of the many covariates are associated with the binary response. Unfortunately, the usual Bayesian computations can be quite challenging and expensive. Here we start with a recently proposed empirical Bayes solution, with strong theoretical convergence properties, and develop a novel and computationally efficient variational approximation thereof. One such novelty is that we develop this approximation directly for the marginal distribution on the model space, rather than on the regression coefficients themselves. We demonstrate the method's strong performance in simulations, and prove that our variational approximation inherits the strong selection consistency property satisfied by the posterior distribution that it is approximating.
- Asia > Middle East > Jordan (0.04)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- North America > United States > North Carolina (0.04)
- Research Report > New Finding (0.72)
- Research Report > Experimental Study (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)